Research Summary
1. Introduction
Rising concerns about urban air quality and its impact on public health have prompted researchers to investigate the links between environmental factors and health outcomes. This report analyzes how weather conditions relate to health risk scores across several U.S. cities during September 2024. We began by formulating SMART questions about how factors such as temperature, humidity, and wind speed affect health risk scores. We then performed an exploratory data analysis (EDA) to uncover patterns and trends in the dataset and applied appropriate statistical tests to validate our findings. The report aims to clarify the connections between environmental conditions and health.
2. Dataset Summary
2.1 Dataset Overview
The dataset comprises 27,674 observations of meteorological and environmental data, including temperature, humidity, and wind speed, recorded across nine U.S. cities during September 2024. It contains 43 variables, including Health Risk Score and Severity Score, and provides a detailed view of daily atmospheric conditions. Key variables such as temperature, humidity, wind speed, visibility, and weather descriptions enable an in-depth analysis of how urban environmental conditions relate to health.
Data Loading
## datetime datetimeEpoch tempmax tempmin temp feelslikemax feelslikemin
## 1 07-09-2024 1725692400 89.0 62.1 73.3 88.6 62.1
## 2 08-09-2024 1725778800 89.0 60.0 72.4 87.9 60.0
## 3 10-09-2024 1725951600 79.4 59.6 67.8 79.4 59.6
## 4 11-09-2024 1726038000 77.3 57.6 66.3 77.3 57.6
## 5 12-09-2024 1726124400 79.2 57.8 67.4 79.2 57.8
## 6 13-09-2024 1726210800 83.2 58.9 69.6 82.2 58.9
## feelslike dew humidity precip precipprob precipcover snow snowdepth windgust
## 1 73.3 59.8 66.3 0 0 0 0 0 16.1
## 2 72.3 57.6 62.5 0 0 0 0 0 13.9
## 3 67.8 57.2 70.7 0 0 0 0 0 17.4
## 4 66.3 56.8 73.1 0 4 0 0 0 23.0
## 5 67.4 55.6 68.3 0 5 0 0 0 17.9
## 6 69.5 54.2 60.5 0 0 0 0 0 16.1
## windspeed winddir pressure cloudcover visibility solarradiation solarenergy
## 1 9.2 311.1 1012.2 12.0 10.0 267.7 23.4
## 2 8.1 310.2 1012.1 15.6 9.8 279.0 24.1
## 3 9.8 290.2 1012.5 18.8 12.4 274.7 23.8
## 4 13.4 273.9 1009.6 17.3 15.0 264.0 22.6
## 5 10.7 285.8 1007.0 14.2 15.0 262.2 22.6
## 6 8.9 287.5 1007.4 5.9 15.0 263.2 22.5
## uvindex severerisk sunrise sunriseEpoch sunset sunsetEpoch moonphase
## 1 9 10 06:43:31 1725716611 19:26:34 1725762394 0.16
## 2 9 10 06:44:20 1725803060 19:25:03 1725848703 0.19
## 3 9 10 06:45:59 1725975959 19:22:01 1726021321 0.25
## 4 8 10 06:46:48 1726062408 19:20:29 1726107629 0.29
## 5 8 10 06:47:38 1726148858 19:18:57 1726193937 0.32
## 6 8 10 06:48:27 1726235307 19:17:25 1726280245 0.36
## conditions description icon source City
## 1 Clear Clear conditions throughout the day. clear-day comb San Jose
## 2 Clear Clear conditions throughout the day. clear-day fcst San Jose
## 3 Clear Clear conditions throughout the day. clear-day fcst San Jose
## 4 Clear Clear conditions throughout the day. clear-day fcst San Jose
## 5 Clear Clear conditions throughout the day. clear-day fcst San Jose
## 6 Clear Clear conditions throughout the day. clear-day fcst San Jose
## Temp_Range Heat_Index Severity_Score Month Season Day_of_Week Is_Weekend
## 1 26.9 75.8425 3.41 9 Fall Saturday True
## 2 29.0 75.9270 3.19 9 Fall Sunday True
## 3 19.8 73.5164 3.54 9 Fall Tuesday False
## 4 19.7 72.9060 3.90 9 Fall Wednesday False
## 5 21.4 74.3009 3.39 9 Fall Thursday False
## 6 24.3 75.8192 3.21 9 Fall Friday False
## Health_Risk_Score
## 1 9.84508
## 2 9.58645
## 3 9.85442
## 4 10.14150
## 5 9.74546
## 6 9.52397
Data Description and Summary
## [1] "Row Count: 27674 Column Count: 43"
## 'data.frame': 27674 obs. of 43 variables:
## $ datetime : chr "07-09-2024" "08-09-2024" "10-09-2024" "11-09-2024" ...
## $ datetimeEpoch : num 1725692400 1725778800 1725951600 1726038000 1726124400 ...
## $ tempmax : num 89 89 79.4 77.3 79.2 83.2 81.4 78.3 81.2 82.3 ...
## $ tempmin : num 62.1 60 59.6 57.6 57.8 58.9 59.4 59.8 59.3 60.9 ...
## $ temp : num 73.3 72.4 67.8 66.3 67.4 69.6 68.8 66.8 68.9 68.5 ...
## $ feelslikemax : num 88.6 87.9 79.4 77.3 79.2 82.2 81.3 78.3 79.6 80.2 ...
## $ feelslikemin : num 62.1 60 59.6 57.6 57.8 58.9 59.4 59.8 59.3 60.9 ...
## $ feelslike : num 73.3 72.3 67.8 66.3 67.4 69.5 68.8 66.8 68.6 68.4 ...
## $ dew : num 59.8 57.6 57.2 56.8 55.6 54.2 55.5 47.3 44.4 46.6 ...
## $ humidity : num 66.3 62.5 70.7 73.1 68.3 60.5 64.2 52.9 43.5 47.6 ...
## $ precip : num 0 0 0 0 0 0 0 0 0 0 ...
## $ precipprob : num 0 0 0 4 5 0 1 3.2 0 0 ...
## $ precipcover : num 0 0 0 0 0 0 0 0 0 0 ...
## $ snow : int 0 0 0 0 0 0 0 0 0 0 ...
## $ snowdepth : num 0 0 0 0 0 0 0 0 0 0 ...
## $ windgust : num 16.1 13.9 17.4 23 17.9 16.1 16.6 9.8 7.8 8.5 ...
## $ windspeed : num 9.2 8.1 9.8 13.4 10.7 8.9 9.8 8.9 8.1 10.3 ...
## $ winddir : num 311 310 290 274 286 ...
## $ pressure : num 1012 1012 1012 1010 1007 ...
## $ cloudcover : num 12 15.6 18.8 17.3 14.2 5.9 8.9 3.1 8.6 14.2 ...
## $ visibility : num 10 9.8 12.4 15 15 15 14.9 14.9 15 14.9 ...
## $ solarradiation : num 268 279 275 264 262 ...
## $ solarenergy : num 23.4 24.1 23.8 22.6 22.6 22.5 22.3 22 21.7 21.3 ...
## $ uvindex : num 9 9 9 8 8 8 8 8 8 8 ...
## $ severerisk : num 10 10 10 10 10 10 10 10 10 10 ...
## $ sunrise : chr "06:43:31" "06:44:20" "06:45:59" "06:46:48" ...
## $ sunriseEpoch : num 1725716611 1725803060 1725975959 1726062408 1726148858 ...
## $ sunset : chr "19:26:34" "19:25:03" "19:22:01" "19:20:29" ...
## $ sunsetEpoch : num 1725762394 1725848703 1726021321 1726107629 1726193937 ...
## $ moonphase : num 0.16 0.19 0.25 0.29 0.32 0.36 0.39 0.42 0.46 0.5 ...
## $ conditions : chr "Clear" "Clear" "Clear" "Clear" ...
## $ description : chr "Clear conditions throughout the day." "Clear conditions throughout the day." "Clear conditions throughout the day." "Clear conditions throughout the day." ...
## $ icon : chr "clear-day" "clear-day" "clear-day" "clear-day" ...
## $ source : chr "comb" "fcst" "fcst" "fcst" ...
## $ City : chr "San Jose" "San Jose" "San Jose" "San Jose" ...
## $ Temp_Range : num 26.9 29 19.8 19.7 21.4 24.3 22 18.5 21.9 21.4 ...
## $ Heat_Index : num 75.8 75.9 73.5 72.9 74.3 ...
## $ Severity_Score : num 3.41 3.19 3.54 3.9 3.39 3.21 3.26 2.58 2.38 2.45 ...
## $ Month : int 9 9 9 9 9 9 9 9 9 9 ...
## $ Season : chr "Fall" "Fall" "Fall" "Fall" ...
## $ Day_of_Week : chr "Saturday" "Sunday" "Tuesday" "Wednesday" ...
## $ Is_Weekend : chr "True" "True" "False" "False" ...
## $ Health_Risk_Score: num 9.85 9.59 9.85 10.14 9.75 ...
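The loading and inspection step can be sketched in base R as follows. The report does not name its input file, so this hypothetical example parses an inline CSV holding the first three rows and a subset of the columns printed above; with the real file, a `read.csv()` call on its path would replace the `text` argument.

```r
# Hypothetical loading sketch: inline CSV with 3 rows and 10 of the 43 columns
# shown above, standing in for the real (unnamed) Kaggle file.
csv_text <- "datetime,tempmax,tempmin,temp,humidity,windspeed,windgust,City,Is_Weekend,Health_Risk_Score
07-09-2024,89.0,62.1,73.3,66.3,9.2,16.1,San Jose,True,9.84508
08-09-2024,89.0,60.0,72.4,62.5,8.1,13.9,San Jose,True,9.58645
10-09-2024,79.4,59.6,67.8,70.7,9.8,17.4,San Jose,False,9.85442"

# stringsAsFactors = FALSE keeps text columns such as City as character
weather <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Same style of summary line as the "Row Count / Column Count" output above
print(paste("Row Count:", nrow(weather), "Column Count:", ncol(weather)))
str(weather)
```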
2.2 Data Source
The dataset was obtained from Kaggle and appears to originate from a weather forecasting or environmental data aggregation service.
2.3 Scope and SMART Questions
To guide our investigation, we formulated the following SMART questions:
1. How do health risk scores differ between weekdays and weekends?
2. How do key meteorological factors vary across cities?
3. Which meteorological factors significantly influence health risk scores?
4. How might changes in humidity affect overall health?
5. How do wind speed and health risk scores vary over days?
These questions help us explore the connections between environmental conditions and health outcomes.
3. Limitations of the Dataset
The dataset has temporal and spatial limitations: it covers only September 2024 and a limited number of cities, so it may not represent weather patterns, or the associated health risks, across other seasons, years, or regions. It also lacks variables that would explain how the health risk scores were calculated, so significant factors that affect health may be overlooked.
4. Exploratory Data Analysis
4.1 Appropriate Datatype Conversion
After conversion, 9 of the 43 variables are factors, 30 are numeric (double), 1 is integer (snow), and 3 are character (datetime, sunrise, and sunset).
## datetime datetimeEpoch tempmax tempmin
## "character" "numeric" "numeric" "numeric"
## temp feelslikemax feelslikemin feelslike
## "numeric" "numeric" "numeric" "numeric"
## dew humidity precip precipprob
## "numeric" "numeric" "numeric" "numeric"
## precipcover snow snowdepth windgust
## "numeric" "integer" "numeric" "numeric"
## windspeed winddir pressure cloudcover
## "numeric" "numeric" "numeric" "numeric"
## visibility solarradiation solarenergy uvindex
## "numeric" "numeric" "numeric" "numeric"
## severerisk sunrise sunriseEpoch sunset
## "numeric" "character" "numeric" "character"
## sunsetEpoch moonphase conditions description
## "numeric" "numeric" "factor" "factor"
## icon source City Temp_Range
## "factor" "factor" "factor" "numeric"
## Heat_Index Severity_Score Month Season
## "numeric" "numeric" "factor" "factor"
## Day_of_Week Is_Weekend Health_Risk_Score
## "factor" "factor" "numeric"
## datetime datetimeEpoch tempmax tempmin
## Length:27674 Min. :1725642512 Min. :70.2 Min. :50.5
## Class :character 1st Qu.:1725900770 1st Qu.:77.3 1st Qu.:58.4
## Mode :character Median :1726142273 Median :81.4 Median :61.2
## Mean :1726191240 Mean :80.6 Mean :61.4
## 3rd Qu.:1726440108 3rd Qu.:83.2 3rd Qu.:64.6
## Max. :1726932525 Max. :90.5 Max. :74.9
##
## temp feelslikemax feelslikemin feelslike dew
## Min. :60.1 Min. :68.8 Min. :50.2 Min. :59.3 Min. :41.1
## 1st Qu.:66.5 1st Qu.:77.3 1st Qu.:58.0 1st Qu.:66.6 1st Qu.:47.5
## Median :70.5 Median :81.0 Median :61.0 Median :70.4 Median :53.8
## Mean :69.9 Mean :80.2 Mean :61.4 Mean :70.0 Mean :53.1
## 3rd Qu.:73.4 3rd Qu.:83.1 3rd Qu.:64.7 3rd Qu.:73.3 3rd Qu.:58.5
## Max. :79.7 Max. :90.3 Max. :76.5 Max. :80.1 Max. :65.9
##
## humidity precip precipprob precipcover
## Min. :38.5 Min. :-0.020883 Min. :-5.902 Min. :-1.4813
## 1st Qu.:51.9 1st Qu.:-0.002877 1st Qu.:-0.286 1st Qu.:-0.2819
## Median :57.5 Median : 0.000000 Median : 1.000 Median : 0.0000
## Mean :57.3 Mean : 0.000908 Mean : 2.355 Mean : 0.0497
## 3rd Qu.:63.0 3rd Qu.: 0.005045 3rd Qu.: 3.282 3rd Qu.: 0.3923
## Max. :76.6 Max. : 0.024597 Max. :20.811 Max. : 2.2277
##
## snow snowdepth windgust windspeed winddir
## Min. :0 Min. :0 Min. : 3.50 Min. : 4.89 Min. : 21.5
## 1st Qu.:0 1st Qu.:0 1st Qu.: 9.21 1st Qu.: 8.25 1st Qu.:164.0
## Median :0 Median :0 Median :13.69 Median : 9.20 Median :200.9
## Mean :0 Mean :0 Mean :13.11 Mean : 9.33 Mean :207.5
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:16.28 3rd Qu.:10.43 3rd Qu.:279.4
## Max. :0 Max. :0 Max. :23.42 Max. :15.00 Max. :330.2
##
## pressure cloudcover visibility solarradiation solarenergy
## Min. :1006 Min. :-4.40 Min. : 9.53 Min. :214 Min. :18.7
## 1st Qu.:1012 1st Qu.: 3.43 1st Qu.:11.93 1st Qu.:250 1st Qu.:21.6
## Median :1016 Median :10.26 Median :14.90 Median :259 Median :22.3
## Mean :1016 Mean :11.03 Mean :13.63 Mean :260 Mean :22.5
## 3rd Qu.:1021 3rd Qu.:16.63 3rd Qu.:15.05 3rd Qu.:268 3rd Qu.:23.3
## Max. :1031 Max. :29.63 Max. :15.71 Max. :313 Max. :26.8
##
## uvindex severerisk sunrise sunriseEpoch
## Min. : 5.72 Min. : 8.21 Length:27674 Min. :1725673373
## 1st Qu.: 6.89 1st Qu.: 9.58 Class :character 1st Qu.:1725957500
## Median : 7.98 Median :10.00 Mode :character Median :1726170046
## Mean : 7.68 Mean :10.06 Mean :1726215696
## 3rd Qu.: 8.37 3rd Qu.:10.63 3rd Qu.:1726451108
## Max. :10.00 Max. :12.06 Max. :1726990562
##
## sunset sunsetEpoch moonphase
## Length:27674 Min. :1725711455 Min. :0.141
## Class :character 1st Qu.:1725975203 1st Qu.:0.231
## Mode :character Median :1726209348 Median :0.325
## Mean :1726260189 Mean :0.346
## 3rd Qu.:1726496196 3rd Qu.:0.450
## Max. :1727027186 Max. :0.647
##
## conditions description
## Clear :22826 Becoming cloudy in the afternoon. : 404
## Partially cloudy: 4848 Clear conditions throughout the day.:22422
## Clearing in the afternoon. : 404
## Partly cloudy throughout the day. : 4444
##
##
##
## icon source City Temp_Range
## clear-day :22826 comb: 505 San Jose :7777 Min. : 8.1
## partly-cloudy-day: 4848 fcst:27169 New York City:6060 1st Qu.:16.9
## Philadelphia :3939 Median :19.8
## Chicago :3838 Mean :19.3
## Los Angeles :3838 3rd Qu.:21.7
## Dallas : 909 Max. :29.8
## (Other) :1313
## Heat_Index Severity_Score Month Season Day_of_Week
## Min. :72.5 Min. :1.85 9:27674 Fall:27674 Friday :3737
## 1st Qu.:76.1 1st Qu.:2.44 Monday :4343
## Median :77.0 Median :2.72 Saturday :3333
## Mean :77.1 Mean :2.85 Sunday :5858
## 3rd Qu.:78.0 3rd Qu.:3.23 Thursday :3535
## Max. :81.6 Max. :4.32 Tuesday :2727
## Wednesday:4141
## Is_Weekend Health_Risk_Score
## False:18483 Min. : 8.41
## True : 9191 1st Qu.: 9.06
## Median : 9.28
## Mean : 9.34
## 3rd Qu.: 9.59
## Max. :10.70
##
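The conversion step can be sketched as below, assuming the categorical columns are the nine that appear as "factor" in the class listing above. The small frame reuses values from the first rows shown earlier and contains only some of those columns, so the sketch converts whichever candidates are present.

```r
# Small frame built from the first rows shown earlier (subset of columns)
weather <- data.frame(
  conditions  = c("Clear", "Clear", "Clear"),
  City        = c("San Jose", "San Jose", "San Jose"),
  Month       = c(9L, 9L, 9L),
  Day_of_Week = c("Saturday", "Sunday", "Tuesday"),
  Is_Weekend  = c("True", "True", "False"),
  temp        = c(73.3, 72.4, 67.8),
  stringsAsFactors = FALSE
)

# The nine variables reported as factors after conversion
factor_cols <- c("conditions", "description", "icon", "source", "City",
                 "Month", "Season", "Day_of_Week", "Is_Weekend")

# Convert only the candidates that exist in this frame
present <- intersect(factor_cols, names(weather))
weather[present] <- lapply(weather[present], factor)

# Count columns per class, analogous to the class breakdown reported above
classes <- sapply(weather, class)
print(table(classes))
```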
4.2 Dropping unwanted columns
We dropped the Season, snow, snowdepth, and Month columns because each contains only a single value, which reduces the dataset from 43 to 39 columns.
## datetime datetimeEpoch tempmax tempmin
## 15 262 268 264
## temp feelslikemax feelslikemin feelslike
## 270 270 264 271
## dew humidity precip precipprob
## 272 269 234 243
## precipcover snow snowdepth windgust
## 234 1 1 261
## windspeed winddir pressure cloudcover
## 252 273 271 271
## visibility solarradiation solarenergy uvindex
## 242 274 260 238
## severerisk sunrise sunriseEpoch sunset
## 234 45 274 44
## sunsetEpoch moonphase conditions description
## 274 250 2 4
## icon source City Temp_Range
## 2 2 9 269
## Heat_Index Severity_Score Month Season
## 274 268 1 1
## Day_of_Week Is_Weekend Health_Risk_Score
## 7 2 27674
## datetime datetimeEpoch tempmax tempmin temp feelslikemax feelslikemin
## 1 07-09-2024 1725692400 89.0 62.1 73.3 88.6 62.1
## 2 08-09-2024 1725778800 89.0 60.0 72.4 87.9 60.0
## 3 10-09-2024 1725951600 79.4 59.6 67.8 79.4 59.6
## 4 11-09-2024 1726038000 77.3 57.6 66.3 77.3 57.6
## 5 12-09-2024 1726124400 79.2 57.8 67.4 79.2 57.8
## 6 13-09-2024 1726210800 83.2 58.9 69.6 82.2 58.9
## feelslike dew humidity precip precipprob precipcover windgust windspeed
## 1 73.3 59.8 66.3 0 0 0 16.1 9.2
## 2 72.3 57.6 62.5 0 0 0 13.9 8.1
## 3 67.8 57.2 70.7 0 0 0 17.4 9.8
## 4 66.3 56.8 73.1 0 4 0 23.0 13.4
## 5 67.4 55.6 68.3 0 5 0 17.9 10.7
## 6 69.5 54.2 60.5 0 0 0 16.1 8.9
## winddir pressure cloudcover visibility solarradiation solarenergy uvindex
## 1 311.1 1012.2 12.0 10.0 267.7 23.4 9
## 2 310.2 1012.1 15.6 9.8 279.0 24.1 9
## 3 290.2 1012.5 18.8 12.4 274.7 23.8 9
## 4 273.9 1009.6 17.3 15.0 264.0 22.6 8
## 5 285.8 1007.0 14.2 15.0 262.2 22.6 8
## 6 287.5 1007.4 5.9 15.0 263.2 22.5 8
## severerisk sunrise sunriseEpoch sunset sunsetEpoch moonphase conditions
## 1 10 06:43:31 1725716611 19:26:34 1725762394 0.16 Clear
## 2 10 06:44:20 1725803060 19:25:03 1725848703 0.19 Clear
## 3 10 06:45:59 1725975959 19:22:01 1726021321 0.25 Clear
## 4 10 06:46:48 1726062408 19:20:29 1726107629 0.29 Clear
## 5 10 06:47:38 1726148858 19:18:57 1726193937 0.32 Clear
## 6 10 06:48:27 1726235307 19:17:25 1726280245 0.36 Clear
## description icon source City Temp_Range
## 1 Clear conditions throughout the day. clear-day comb San Jose 26.9
## 2 Clear conditions throughout the day. clear-day fcst San Jose 29.0
## 3 Clear conditions throughout the day. clear-day fcst San Jose 19.8
## 4 Clear conditions throughout the day. clear-day fcst San Jose 19.7
## 5 Clear conditions throughout the day. clear-day fcst San Jose 21.4
## 6 Clear conditions throughout the day. clear-day fcst San Jose 24.3
## Heat_Index Severity_Score Day_of_Week Is_Weekend Health_Risk_Score
## 1 75.8425 3.41 Saturday True 9.84508
## 2 75.9270 3.19 Sunday True 9.58645
## 3 73.5164 3.54 Tuesday False 9.85442
## 4 72.9060 3.90 Wednesday False 10.14150
## 5 74.3009 3.39 Thursday False 9.74546
## 6 75.8192 3.21 Friday False 9.52397
## [1] 27674 39
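Detecting and dropping single-valued columns can be sketched by counting distinct values per column, as in the unique-count output above, and keeping only columns with more than one. The frame below reuses the first four rows shown earlier for a few columns.

```r
weather <- data.frame(
  temp      = c(73.3, 72.4, 67.8, 66.3),
  humidity  = c(66.3, 62.5, 70.7, 73.1),
  snow      = c(0L, 0L, 0L, 0L),
  snowdepth = c(0, 0, 0, 0),
  Month     = c(9L, 9L, 9L, 9L),
  Season    = c("Fall", "Fall", "Fall", "Fall")
)

# Number of distinct values in each column
n_unique <- sapply(weather, function(col) length(unique(col)))
print(n_unique)

# Record and drop the columns that carry a single value
single_valued <- names(n_unique)[n_unique == 1]
weather <- weather[, n_unique > 1, drop = FALSE]
print(dim(weather))
```

On a small subset, columns such as City can also look constant by chance, so the check is only meaningful on the full dataset.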
4.3 Duplicates and missing values removal
The dataset contains no missing values and no duplicate rows, so no rows were removed in this step.
## [1] "Number of duplicate rows: 0"
## datetime datetimeEpoch tempmax tempmin
## 0 0 0 0
## temp feelslikemax feelslikemin feelslike
## 0 0 0 0
## dew humidity precip precipprob
## 0 0 0 0
## precipcover windgust windspeed winddir
## 0 0 0 0
## pressure cloudcover visibility solarradiation
## 0 0 0 0
## solarenergy uvindex severerisk sunrise
## 0 0 0 0
## sunriseEpoch sunset sunsetEpoch moonphase
## 0 0 0 0
## conditions description icon source
## 0 0 0 0
## City Temp_Range Heat_Index Severity_Score
## 0 0 0 0
## Day_of_Week Is_Weekend Health_Risk_Score
## 0 0 0
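The duplicate and missing-value checks can be sketched with base R as below, on a small frame taken from the first rows shown earlier. The last two lines show common remedies that would only change the data if either check had found problems.

```r
weather <- data.frame(
  datetime = c("07-09-2024", "08-09-2024", "10-09-2024"),
  temp     = c(73.3, 72.4, 67.8),
  Health_Risk_Score = c(9.84508, 9.58645, 9.85442)
)

# Count fully duplicated rows
n_dup <- sum(duplicated(weather))
print(paste("Number of duplicate rows:", n_dup))

# Missing values per column
na_counts <- colSums(is.na(weather))
print(na_counts)

# Remedies (no-ops here): drop exact duplicates, then rows with any NA
weather <- unique(weather)
weather <- weather[complete.cases(weather), ]
```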
4.4 Outliers Removal
After removing outliers, 18,885 observations remain; 8,789 rows (about 31.8% of the data) were removed.
## datetime datetimeEpoch tempmax tempmin temp feelslikemax feelslikemin
## 1 07-09-2024 1725692400 89.0 62.1 73.3 88.6 62.1
## 3 10-09-2024 1725951600 79.4 59.6 67.8 79.4 59.6
## 5 12-09-2024 1726124400 79.2 57.8 67.4 79.2 57.8
## 6 13-09-2024 1726210800 83.2 58.9 69.6 82.2 58.9
## 7 14-09-2024 1726297200 81.4 59.4 68.8 81.3 59.4
## 8 15-09-2024 1726383600 78.3 59.8 66.8 78.3 59.8
## feelslike dew humidity precip precipprob precipcover windgust windspeed
## 1 73.3 59.8 66.3 0 0.0 0 16.1 9.2
## 3 67.8 57.2 70.7 0 0.0 0 17.4 9.8
## 5 67.4 55.6 68.3 0 5.0 0 17.9 10.7
## 6 69.5 54.2 60.5 0 0.0 0 16.1 8.9
## 7 68.8 55.5 64.2 0 1.0 0 16.6 9.8
## 8 66.8 47.3 52.9 0 3.2 0 9.8 8.9
## winddir pressure cloudcover visibility solarradiation solarenergy uvindex
## 1 311.1 1012.2 12.0 10.0 267.7 23.4 9
## 3 290.2 1012.5 18.8 12.4 274.7 23.8 9
## 5 285.8 1007.0 14.2 15.0 262.2 22.6 8
## 6 287.5 1007.4 5.9 15.0 263.2 22.5 8
## 7 263.1 1010.3 8.9 14.9 259.1 22.3 8
## 8 256.5 1011.5 3.1 14.9 255.1 22.0 8
## severerisk sunrise sunriseEpoch sunset sunsetEpoch moonphase conditions
## 1 10 06:43:31 1725716611 19:26:34 1725762394 0.16 Clear
## 3 10 06:45:59 1725975959 19:22:01 1726021321 0.25 Clear
## 5 10 06:47:38 1726148858 19:18:57 1726193937 0.32 Clear
## 6 10 06:48:27 1726235307 19:17:25 1726280245 0.36 Clear
## 7 10 06:49:17 1726321757 19:15:53 1726366553 0.39 Clear
## 8 10 06:50:06 1726408206 19:14:21 1726452861 0.42 Clear
## description icon source City Temp_Range
## 1 Clear conditions throughout the day. clear-day comb San Jose 26.9
## 3 Clear conditions throughout the day. clear-day fcst San Jose 19.8
## 5 Clear conditions throughout the day. clear-day fcst San Jose 21.4
## 6 Clear conditions throughout the day. clear-day fcst San Jose 24.3
## 7 Clear conditions throughout the day. clear-day fcst San Jose 22.0
## 8 Clear conditions throughout the day. clear-day fcst San Jose 18.5
## Heat_Index Severity_Score Day_of_Week Is_Weekend Health_Risk_Score
## 1 75.8425 3.41 Saturday True 9.84508
## 3 73.5164 3.54 Tuesday False 9.85442
## 5 74.3009 3.39 Thursday False 9.74546
## 6 75.8192 3.21 Friday False 9.52397
## 7 75.1635 3.26 Saturday True 9.60927
## 8 77.5603 2.58 Sunday True 9.12085
## [1] "Row Count: 18885 Column Count: 39"
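The report does not state which outlier rule or columns were used. The sketch below assumes the common 1.5 × IQR fence, applied jointly to a chosen set of numeric columns: a row survives only if it falls inside the fences for every column. The helper `remove_outliers_iqr` is hypothetical, and the data are the first six rows shown earlier.

```r
# Hypothetical helper: keep rows inside [Q1 - k*IQR, Q3 + k*IQR] for all cols
remove_outliers_iqr <- function(df, cols, k = 1.5) {
  keep <- rep(TRUE, nrow(df))
  for (col in cols) {
    x <- df[[col]]
    q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
    fence <- k * (q[2] - q[1])
    # Missing values are treated as outside the fences
    keep <- keep & !is.na(x) & x >= q[1] - fence & x <= q[2] + fence
  }
  df[keep, , drop = FALSE]
}

weather <- data.frame(
  windgust = c(16.1, 13.9, 17.4, 23.0, 17.9, 16.1),
  humidity = c(66.3, 62.5, 70.7, 73.1, 68.3, 60.5),
  Health_Risk_Score = c(9.84508, 9.58645, 9.85442, 10.14150, 9.74546, 9.52397)
)

filtered <- remove_outliers_iqr(weather, c("windgust", "humidity"))
n_removed <- nrow(weather) - nrow(filtered)
print(paste("Row Count:", nrow(filtered), "Removed:", n_removed))
```

In this toy run the row with a wind gust of 23.0 falls above the upper fence and is dropped. This illustrates the rule only and is not a claim about why particular rows were removed in the report.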
4.5 Univariate Analysis
Most observations have health risk scores around 9.0 (before outlier removal, the median is 9.28 and the interquartile range spans 9.06 to 9.59), with a small subset showing elevated scores near 10.0.
## [1] "The majority of the population has health risk scores around 9.0, with a small subset showing elevated scores near 10.0."
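The distribution summary can be sketched with `summary()` and binned counts from `hist(..., plot = FALSE)`. The six values below are the Health_Risk_Score entries printed after outlier removal, so they illustrate the calls rather than reproduce the full-data distribution.

```r
hrs <- c(9.84508, 9.85442, 9.74546, 9.52397, 9.60927, 9.12085)
print(summary(hrs))

# Bin counts without drawing; drop plot = FALSE to draw the histogram instead
h <- hist(hrs, breaks = seq(9.0, 10.0, by = 0.25), plot = FALSE)
print(data.frame(bin_start = head(h$breaks, -1), count = h$counts))
```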
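4.6 Bivariate and Multivariate Analysis
A Welch two-sample t-test shows a statistically significant difference in Health Risk Scores between weekends and weekdays (t = 29.51, p < 2e-16). Scores are higher on weekends (mean 9.36) than on weekdays (mean 9.17), indicating increased health risk during weekends, although the difference of about 0.19 points (95% CI 0.175 to 0.200) is small in absolute terms.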
## [1] "Is there a statistically significant difference in the Health Risk Score on weekends compared to weekdays?"
## Hypothesis Statements:
## Null Hypothesis (H0): There is no significant difference in the Health Risk Score between weekends and weekdays.
## Alternative Hypothesis (H1): There is a significant difference in the Health Risk Score between weekends and weekdays.
##
## Welch Two Sample t-test
##
## data: weekend_scores and weekday_scores
## t = 29.51, df = 5925, p-value <2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.175105 0.200027
## sample estimates:
## mean of x mean of y
## 9.36019 9.17262
##
## Interpretation:
## Reject the null hypothesis: There is a statistically significant difference in the Health Risk Score between weekends and weekdays.
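The weekend-versus-weekday test can be sketched with `t.test()`, whose default `var.equal = FALSE` gives the Welch test reported above. The six post-outlier rows printed earlier are used only to show the call; with so few rows the result will not match the full-data statistics.

```r
hrs        <- c(9.84508, 9.85442, 9.74546, 9.52397, 9.60927, 9.12085)
is_weekend <- c("True", "False", "False", "False", "True", "True")

weekend_scores <- hrs[is_weekend == "True"]
weekday_scores <- hrs[is_weekend == "False"]

# Welch two-sample t-test (unequal variances), two-sided by default
res <- t.test(weekend_scores, weekday_scores, var.equal = FALSE)
print(res)

alpha <- 0.05
if (res$p.value < alpha) {
  cat("Reject H0: weekend and weekday means differ.\n")
} else {
  cat("Fail to reject H0 at alpha =", alpha, "\n")
}
```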
## [1] "The weak negative correlation of -0.24 suggests that higher temperatures are slightly associated with lower health risk scores, though the relationship is not strong"
San Jose has the highest health risk score and wind gusts, while Philadelphia shows the lowest health risk score and Los Angeles the lowest wind gusts.
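A correlation heatmap was used to visualize the strength and direction of pairwise relationships (correlations) between the variables in the dataset.
Wind gusts are positively correlated with health risk scores, whereas temperature shows a weaker, slightly negative correlation (r = -0.24). Wind-related factors (gusts and speed) and humidity are associated with higher health risk scores, while visibility, sunrise/sunset times, and moon phase are associated with lower scores. The coefficients printed below should be rechecked: several different variable pairs report identical values (for example, wind speed, sunriseEpoch, and moonphase all show 0.587), and visibility appears with a strong positive coefficient, which points to a labeling error in the code that printed them.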
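The correlation matrix behind such a heatmap can be sketched with `cor()` on the numeric columns. The values are the six post-outlier rows printed earlier, so the coefficients are illustrative only; `heatmap(cor_mat)` from base R would draw the matrix.

```r
weather <- data.frame(
  temp       = c(73.3, 67.8, 67.4, 69.6, 68.8, 66.8),
  humidity   = c(66.3, 70.7, 68.3, 60.5, 64.2, 52.9),
  windspeed  = c(9.2, 9.8, 10.7, 8.9, 9.8, 8.9),
  windgust   = c(16.1, 17.4, 17.9, 16.1, 16.6, 9.8),
  visibility = c(10.0, 12.4, 15.0, 15.0, 14.9, 14.9),
  Health_Risk_Score = c(9.84508, 9.85442, 9.74546, 9.52397, 9.60927, 9.12085)
)

# Pearson correlations between all numeric columns
num_cols <- sapply(weather, is.numeric)
cor_mat <- cor(weather[, num_cols], method = "pearson")
print(round(cor_mat, 2))

# One row of the heatmap: correlations with the target, strongest first
print(sort(cor_mat[, "Health_Risk_Score"], decreasing = TRUE))
```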
## [1] "Does wind gust have a significant impact on the Health Risk Score?"
## [1] "Correlation between windgust and Health risk : 0.719211489784877"
## [1] "Correlation between severity score and Health risk : 0.79812942759995"
## [1] "Correlation between wind speed and Health risk : 0.586553375240766"
## [1] "Correlation between humidity and Health risk : 0.505210350812946"
## [1] "Correlation between humidity and windgust : 0.505210350812946"
## [1] "Correlation between visibility and Health risk : 0.861818410192233"
## [1] "Correlation between sunsetEpoch and Health risk : 0.79812942759995"
## [1] "Correlation between sunriseEpoch and Health risk : 0.586553375240766"
## [1] "Correlation between moonphase and Health risk : 0.586553375240766"
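Whether each correlation is statistically significant can be sketched with `cor.test()`. In this version each printed label is taken from the same column name that is passed to the test, so a label cannot be paired with another variable's coefficient. The data are again the six post-outlier rows printed earlier, so the numbers are illustrative only.

```r
weather <- data.frame(
  windgust   = c(16.1, 17.4, 17.9, 16.1, 16.6, 9.8),
  windspeed  = c(9.2, 9.8, 10.7, 8.9, 9.8, 8.9),
  humidity   = c(66.3, 70.7, 68.3, 60.5, 64.2, 52.9),
  visibility = c(10.0, 12.4, 15.0, 15.0, 14.9, 14.9),
  Health_Risk_Score = c(9.84508, 9.85442, 9.74546, 9.52397, 9.60927, 9.12085)
)

predictors <- c("windgust", "windspeed", "humidity", "visibility")

# One row per predictor: label, Pearson r, and its p-value
results <- do.call(rbind, lapply(predictors, function(var) {
  ct <- cor.test(weather[[var]], weather$Health_Risk_Score, method = "pearson")
  data.frame(variable = var, r = unname(ct$estimate), p_value = ct$p.value,
             stringsAsFactors = FALSE)
}))
print(results)
```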
## # A tibble: 5 × 6
## City Avg_Dew_Point Avg_Humidity Avg_Pressure Total_Precipitation pressure_a
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Chicago 52.4 52.2 1018. 1.45 3391525.
## 2 Los An… 45.8 50.6 1012. 10.2 3678768.
## 3 New Yo… 57.0 60.6 1021. 0.232 3590429.
## 4 Philad… 59.0 63.2 1022. 2.99 2374491.
## 5 San Jo… 51.9 59.3 1010. -1.52 6143571.
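A per-city summary like the table above can be sketched with base R `aggregate()`. The mapping of output columns (means of dew, humidity, and pressure; summed precipitation) is inferred from the column names and is an assumption, and the unexplained `pressure_a` column is omitted. The input rows are the six San Jose rows shown after outlier removal, so only one group is produced.

```r
weather <- data.frame(
  City     = rep("San Jose", 6),
  dew      = c(59.8, 57.2, 55.6, 54.2, 55.5, 47.3),
  humidity = c(66.3, 70.7, 68.3, 60.5, 64.2, 52.9),
  pressure = c(1012.2, 1012.5, 1007.0, 1007.4, 1010.3, 1011.5),
  precip   = c(0, 0, 0, 0, 0, 0)
)

# Means per city for the continuous variables, total for precipitation
city_means  <- aggregate(cbind(dew, humidity, pressure) ~ City, data = weather, FUN = mean)
city_precip <- aggregate(precip ~ City, data = weather, FUN = sum)

city_summary <- merge(city_means, city_precip, by = "City")
names(city_summary) <- c("City", "Avg_Dew_Point", "Avg_Humidity",
                         "Avg_Pressure", "Total_Precipitation")
print(city_summary)
```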
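Health risk scores and wind gusts fluctuate similarly over time, both peaking around September 7 and 8 and declining sharply by mid-September. This co-movement suggests that health risk scores are influenced by wind gusts, although daily averages alone cannot establish causation. In the first table below, the Average_HRS column holds average wind gust values despite its name.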
## [1] "How does the windgust vary over dates?"
## # A tibble: 6 × 2
## datetime Average_HRS
## <date> <dbl>
## 1 2024-09-07 15.9
## 2 2024-09-08 17.5
## 3 2024-09-09 15.6
## 4 2024-09-10 15.2
## 5 2024-09-11 11.2
## 6 2024-09-12 13.4
## [1] "How does the Health Risk Score vary over dates?"
## # A tibble: 6 × 2
## datetime Average_HRS
## <date> <dbl>
## 1 2024-09-07 9.83
## 2 2024-09-08 9.80
## 3 2024-09-09 9.22
## 4 2024-09-10 9.42
## 5 2024-09-11 9.01
## 6 2024-09-12 9.30
## # A tibble: 6 × 3
## # Groups: City [2]
## City datetime Average_HRS
## <fct> <date> <dbl>
## 1 Chicago 2024-09-08 9.75
## 2 Chicago 2024-09-09 9.14
## 3 Chicago 2024-09-10 9.06
## 4 Chicago 2024-09-11 9.15
## 5 Chicago 2024-09-12 9.26
## 6 Los Angeles 2024-09-15 8.72
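The daily and city-by-day averages can be sketched with `aggregate()`. The datetime column is stored as day-month-year text ("07-09-2024" is 7 September, matching the dates in the tables above), so it is parsed with format `"%d-%m-%Y"`. The input is the first six rows shown earlier, all from San Jose with one row per date, so each average equals the single value for that day.

```r
weather <- data.frame(
  datetime = c("07-09-2024", "08-09-2024", "10-09-2024",
               "11-09-2024", "12-09-2024", "13-09-2024"),
  City     = rep("San Jose", 6),
  windgust = c(16.1, 13.9, 17.4, 23.0, 17.9, 16.1),
  Health_Risk_Score = c(9.84508, 9.58645, 9.85442, 10.14150, 9.74546, 9.52397)
)

# Parse day-month-year strings into Date values
weather$date <- as.Date(weather$datetime, format = "%d-%m-%Y")

# Daily averages of wind gust and health risk score, with explicit labels
daily <- aggregate(cbind(windgust, Health_Risk_Score) ~ date, data = weather, FUN = mean)
names(daily) <- c("date", "Avg_Windgust", "Avg_HRS")
print(daily)

# Average health risk score per city and day
city_daily <- aggregate(Health_Risk_Score ~ City + date, data = weather, FUN = mean)
print(head(city_daily))
```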
5. Conclusion and EDA Insights
Health Risk Variation: Health risk scores are significantly higher on weekends than on weekdays (mean 9.36 vs. 9.17), although the difference is small.
City Comparison: San Jose has the highest health risk scores and wind gusts, while Philadelphia has the lowest health risk scores and Los Angeles the lowest wind gusts.
Meteorological Impact: Wind gusts show a strong positive correlation with health risk scores, while temperature shows only a weak negative correlation.
Humidity and Health: Humidity has a moderate positive correlation with health risk scores.
Key Factors: Wind-related factors are associated with higher health risk scores, while visibility and sunrise/sunset times are associated with lower scores.
6. Further Research and Limitations
Standardizing and screening health risk data sources, and expanding the study to more cities and a broader scope.
Conducting time-series analyses to explore the relationship between seasonal meteorological factors and health risks.
Developing strategies to mitigate the adverse health impacts of extreme weather conditions, especially in areas with high wind or humidity levels.
7. References